
Cognitive Bias


Exploiting Synergistic Cognitive Biases to Bypass Safety in LLMs

Yang, Xikang, Zhou, Biyu, Tang, Xuehai, Han, Jizhong, Hu, Songlin

arXiv.org Artificial Intelligence

Large Language Models (LLMs) demonstrate impressive capabilities across a wide range of tasks, yet their safety mechanisms remain susceptible to adversarial attacks that exploit cognitive biases -- systematic deviations from rational judgment. Unlike prior jailbreaking approaches focused on prompt engineering or algorithmic manipulation, this work highlights the overlooked power of multi-bias interactions in undermining LLM safeguards. We propose CognitiveAttack, a novel red-teaming framework that systematically leverages both individual and combined cognitive biases. By integrating supervised fine-tuning and reinforcement learning, CognitiveAttack generates prompts that embed optimized bias combinations, effectively bypassing safety protocols while maintaining high attack success rates. Experimental results reveal significant vulnerabilities across 30 diverse LLMs, particularly in open-source models. CognitiveAttack achieves a substantially higher attack success rate compared to the SOTA black-box method PAP (60.1% vs. 31.6%), exposing critical limitations in current defense mechanisms. These findings highlight multi-bias interactions as a powerful yet underexplored attack vector. This work introduces a novel interdisciplinary perspective by bridging cognitive science and LLM safety, paving the way for more robust and human-aligned AI systems.
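
The headline comparison is an attack success rate (ASR): the fraction of adversarial prompts a judge deems to have bypassed a model's safeguards. A minimal sketch of that bookkeeping, with purely illustrative model names and judgments (nothing here reproduces the paper's attack itself):

```python
from collections import defaultdict

def attack_success_rate(judgments):
    """judgments: iterable of (model_name, bypassed_safety) pairs."""
    totals = defaultdict(int)
    successes = defaultdict(int)
    for model, bypassed in judgments:
        totals[model] += 1
        successes[model] += int(bypassed)
    return {model: successes[model] / totals[model] for model in totals}

# Illustrative data only; real judgments would come from a safety judge.
judgments = [
    ("open-model-a", True), ("open-model-a", True), ("open-model-a", False),
    ("closed-model-b", True), ("closed-model-b", False),
]
print(attack_success_rate(judgments))
# approx. {'open-model-a': 0.67, 'closed-model-b': 0.5}
```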


Bias in the Loop: How Humans Evaluate AI-Generated Suggestions

Beck, Jacob, Eckman, Stephanie, Kern, Christoph, Kreuter, Frauke

arXiv.org Machine Learning

Human-AI collaboration increasingly drives decision-making across industries, from medical diagnosis to content moderation. While AI systems promise efficiency gains by providing automated suggestions for human review, these workflows can trigger cognitive biases that degrade performance. We know little about the psychological factors that determine when these collaborations succeed or fail. We conducted a randomized experiment with 2,784 participants to examine how task design and individual characteristics shape human responses to AI-generated suggestions. Using a controlled annotation task, we manipulated three factors: AI suggestion quality in the first three instances, task burden through required corrections, and performance-based financial incentives. We collected demographics, attitudes toward AI, and behavioral data to assess four performance metrics: accuracy, correction activity, overcorrection, and undercorrection. Two patterns emerged that challenge conventional assumptions about human-AI collaboration. First, requiring corrections for flagged AI errors reduced engagement and increased the tendency to accept incorrect suggestions, demonstrating how cognitive shortcuts influence collaborative outcomes. Second, individual attitudes toward AI emerged as the strongest predictor of performance, surpassing demographic factors. Participants skeptical of AI detected errors more reliably and achieved higher accuracy, while those favorable toward automation exhibited dangerous overreliance on algorithmic suggestions. The findings reveal that successful human-AI collaboration depends not only on algorithmic performance but also on who reviews AI outputs and how review processes are structured. Effective human-AI collaborations require consideration of human psychology: selecting diverse evaluator samples, measuring attitudes, and designing workflows that counteract cognitive biases.
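
The four metrics can be made concrete with a small sketch. The definitions below are plausible readings rather than the paper's exact operationalizations: overcorrection means changing an already-correct AI suggestion, undercorrection means accepting a wrong one.

```python
def review_metrics(rows):
    """rows: dicts with 'gold' (true label), 'ai' (suggestion), 'final' (human decision)."""
    n = len(rows)
    accuracy = sum(r["final"] == r["gold"] for r in rows) / n
    changed = [r for r in rows if r["final"] != r["ai"]]          # any correction
    correction_activity = len(changed) / n
    overcorrection = sum(r["ai"] == r["gold"] for r in changed) / n
    undercorrection = sum(r["final"] == r["ai"] != r["gold"] for r in rows) / n
    return {"accuracy": accuracy, "correction_activity": correction_activity,
            "overcorrection": overcorrection, "undercorrection": undercorrection}

rows = [
    {"gold": "spam", "ai": "spam", "final": "spam"},  # correct acceptance
    {"gold": "ham",  "ai": "spam", "final": "spam"},  # undercorrection
    {"gold": "ham",  "ai": "ham",  "final": "spam"},  # overcorrection
]
print(review_metrics(rows))
```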



Investigating the Effects of Cognitive Biases in Prompts on Large Language Model Outputs

Sun, Yan, Kok, Stanley

arXiv.org Artificial Intelligence

This paper investigates the influence of cognitive biases on Large Language Model (LLM) outputs. Cognitive biases, such as confirmation and availability biases, can distort user inputs through prompts, potentially leading to unfaithful and misleading outputs from LLMs. Using a systematic framework, our study introduces various cognitive biases into prompts and assesses their impact on LLM accuracy across multiple benchmark datasets, including general and financial Q&A scenarios. The results demonstrate that even subtle biases can significantly alter LLM answer choices, highlighting a critical need for bias-aware prompt design and mitigation strategies. Additionally, our attention weight analysis highlights how these biases can alter the internal decision-making processes of LLMs, affecting the attention distribution in ways that are associated with output inaccuracies. This research has implications for AI developers and users in enhancing the robustness and reliability of AI applications in diverse domains.
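
As a hedged illustration of the experimental setup (the confirmation-bias framing, item format, and ask_llm placeholder below are my assumptions, not the authors' exact protocol), one can measure the accuracy drop caused by a biased preamble:

```python
def accuracy(items, ask_llm, biased=False):
    """items: (question, gold_answer, distractor) triples.
    ask_llm: placeholder callable mapping a prompt string to a model reply."""
    correct = 0
    for question, gold, distractor in items:
        prompt = question
        if biased:
            # confirmation-bias framing: assert a wrong answer up front
            prompt = f"I'm fairly sure the answer is {distractor}. {question}"
        reply = ask_llm(prompt)
        correct += int(gold.lower() in reply.lower())
    return correct / len(items)

# bias effect = accuracy(items, ask_llm) - accuracy(items, ask_llm, biased=True)
```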


AI Biases as Asymmetries: A Review to Guide Practice

Waters, Gabriella, Honenberger, Phillip

arXiv.org Artificial Intelligence

The understanding of bias in AI is currently undergoing a revolution. Initially understood as errors or flaws, biases are increasingly recognized as integral to AI systems and sometimes preferable to less biased alternatives. In this paper we review the reasons for this changed understanding and provide new guidance on two questions: First, how should we think about and measure biases in AI systems, consistent with the new understanding? Second, what kinds of bias in an AI system should we accept or even amplify, and what kinds should we minimize or eliminate, and why? The key to answering both questions, we argue, is to understand biases as "violations of a symmetry standard" (following Kelly). We distinguish three main types of asymmetry in AI systems - error biases, inequality biases, and process biases - and highlight places in the pipeline of AI development and application where bias of each type is likely to be good, bad, or inevitable.

Cognitive psychology and statistics have informed this shift, highlighting the benefits and costs of biases in decision-making processes. Cognitive psychology presents biases as often helpful in making decisions under conditions of uncertainty. Similarly, statistical methods acknowledge biases as often useful and sometimes necessary for making inferences from data. These insights have been instrumental in redefining biases as not inherently negative, but as sometimes essential components that can and should be harnessed to improve AI systems.


Cognitive Bias Detection Using Advanced Prompt Engineering

Lemieux, Frederic, Behr, Aisha, Kellermann-Bryant, Clara, Mohammed, Zaki

arXiv.org Artificial Intelligence

Cognitive biases, systematic deviations from rationality in judgment, pose significant challenges in generating objective content. This paper introduces a novel approach for real-time cognitive bias detection in user-generated text using large language models (LLMs) and advanced prompt engineering techniques. The proposed system analyzes textual data to identify common cognitive biases such as confirmation bias, circular reasoning, and hidden assumptions. By designing tailored prompts, the system effectively leverages LLMs' capabilities to both recognize and mitigate these biases, improving the quality of human-generated content (e.g., news, media, reports). Experimental results demonstrate the high accuracy of our approach in identifying cognitive biases, offering a valuable tool for enhancing content objectivity and reducing the risks of biased decision-making.

Cognitive biases are systematic patterns of deviation from rational judgment, affecting decision-making processes across various domains, including media, policy-making, and legal reasoning. With the rapid expansion of artificial intelligence (AI) applications, large language models (LLMs) have demonstrated significant potential in processing and evaluating vast amounts of textual information. However, existing research has largely focused on mitigating biases within AI-generated outputs rather than leveraging AI to detect biases in human-generated content. This gap presents a critical challenge in ensuring transparency and fairness in AI-assisted decision-making. This study explores the application of structured prompt engineering as a novel approach to improving LLM accuracy in detecting cognitive biases.
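
A minimal sketch of what such a tailored detection prompt might look like; the bias list matches the abstract, but the wording, JSON schema, and ask_llm placeholder are assumptions, not the authors' prompts:

```python
import json

DETECTION_PROMPT = """You are an expert reviewer of reasoning quality.
Identify any of these cognitive biases in the text below:
confirmation bias, circular reasoning, hidden assumption.
Respond only with JSON: {{"biases": [{{"type": "...", "evidence": "..."}}]}}

Text:
{text}"""

def detect_biases(text, ask_llm):
    """ask_llm: placeholder callable mapping a prompt to the model's reply."""
    reply = ask_llm(DETECTION_PROMPT.format(text=text))
    return json.loads(reply)  # assumes the model complied with the JSON schema
```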


Investigating the Impact of LLM Personality on Cognitive Bias Manifestation in Automated Decision-Making Tasks

He, Jiangen, Liu, Jiqun

arXiv.org Artificial Intelligence

Large Language Models (LLMs) are increasingly used in decision-making, yet their susceptibility to cognitive biases remains a pressing challenge. This study explores how personality traits influence these biases and evaluates the effectiveness of mitigation strategies across various model architectures. Our findings identify six prevalent cognitive biases, while sunk cost and group attribution biases exhibit minimal impact. Personality traits play a crucial role in either amplifying or reducing biases, significantly affecting how LLMs respond to debiasing techniques. Notably, Conscientiousness and Agreeableness may generally enhance the efficacy of bias mitigation strategies, suggesting that LLMs exhibiting these traits are more receptive to corrective measures. These findings underscore the importance of personality-driven bias dynamics and highlight the need for targeted mitigation approaches to improve fairness and reliability in AI-assisted decision-making.
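
A hedged sketch of the basic manipulation: condition the model on a trait description before a decision task, optionally with a debiasing instruction. The persona wording, hint text, and ask_llm placeholder are illustrative assumptions, not the study's materials:

```python
PERSONAS = {
    "high_conscientiousness": "You are careful, methodical, and double-check your reasoning.",
    "high_agreeableness": "You are warm, cooperative, and eager to accommodate others.",
}
DEBIAS_HINT = "Before answering, check your reasoning for cognitive biases."

def run_condition(trait, task, ask_llm, debias=False):
    """ask_llm: placeholder callable mapping a prompt to the model's reply."""
    parts = [PERSONAS[trait], task] + ([DEBIAS_HINT] if debias else [])
    return ask_llm("\n\n".join(parts))
```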


Bias Beware: The Impact of Cognitive Biases on LLM-Driven Product Recommendations

Filandrianos, Giorgos, Dimitriou, Angeliki, Lymperaiou, Maria, Thomas, Konstantinos, Stamou, Giorgos

arXiv.org Artificial Intelligence

The advent of Large Language Models (LLMs) has revolutionized product recommendation systems, yet their susceptibility to adversarial manipulation poses critical challenges, particularly in real-world commercial applications. In this work, we investigate cognitive biases as black-box adversarial strategies, drawing parallels between their effects on LLMs and human purchasing behavior. Our approach is the first to tap into human psychological principles, seamlessly modifying product descriptions so that these adversarial manipulations are hard to detect. Through extensive experiments on LLMs of varying scales, we reveal significant vulnerabilities in their use as recommenders, providing critical insights into safeguarding these systems.
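
One way to probe this kind of vulnerability (the cue text and ranking interface below are illustrative assumptions, not the paper's method) is to append a social-proof cue to one product's description and measure its rank shift:

```python
SOCIAL_PROOF = " Over 10,000 shoppers bought this item last month."

def rank_shift(descriptions, target, rank_with_llm):
    """rank_with_llm: placeholder callable mapping a list of product
    descriptions to a ranked list of their indices (best first)."""
    before = rank_with_llm(descriptions).index(target)
    modified = list(descriptions)
    modified[target] = modified[target] + SOCIAL_PROOF
    after = rank_with_llm(modified).index(target)
    return before - after  # positive means the cue moved the product up
```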


Cognitive Biases in Large Language Models: A Survey and Mitigation Experiments

Sumita, Yasuaki, Takeuchi, Koh, Kashima, Hisashi

arXiv.org Artificial Intelligence

Large Language Models (LLMs) are trained on large corpora written by humans and demonstrate high performance on various tasks. However, as humans are susceptible to cognitive biases, which can result in irrational judgments, LLMs can also be influenced by these biases, leading to irrational decision-making. For example, changing the order of options in multiple-choice questions affects the performance of LLMs due to order bias. In our research, we first conducted an extensive survey of existing studies examining LLMs' cognitive biases and their mitigation. Existing mitigation techniques for LLMs are limited in the types of biases they can address or require lengthy inputs or outputs. We then examined the effectiveness of two mitigation methods for humans, SoPro and AwaRe, when applied to LLMs, inspired by studies in crowdsourcing. To test the effectiveness of these methods, we conducted experiments on GPT-3.5 and GPT-4 to evaluate the influence of six biases on the outputs before and after applying these methods. The results demonstrate that while SoPro has little effect, AwaRe enables LLMs to mitigate the effect of these biases and make more rational responses.
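
The order-bias example is straightforward to operationalize. A hedged sketch (prompt wording and the ask_llm placeholder are my assumptions) checks whether the chosen option content survives every reordering of the answer choices:

```python
import itertools

def is_order_stable(question, options, ask_llm):
    """True if the model picks the same option content under every ordering.
    ask_llm: placeholder callable mapping a prompt to the model's reply.
    Note: len(options)! orderings, so keep the option list short."""
    picks = set()
    for perm in itertools.permutations(options):
        menu = "\n".join(f"{chr(65 + i)}. {opt}" for i, opt in enumerate(perm))
        reply = ask_llm(f"{question}\n{menu}\nAnswer with a single letter.")
        letter = reply.strip().upper()[0]
        picks.add(perm[ord(letter) - 65])  # map the letter back to option text
    return len(picks) == 1  # False indicates order bias on this question
```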


Meaningless is better: hashing bias-inducing words in LLM prompts improves performance in logical reasoning and statistical learning

Chadimová, Milena, Jurášek, Eduard, Kliegr, Tomáš

arXiv.org Artificial Intelligence

This paper introduces a novel method, referred to as "hashing", which involves masking potentially bias-inducing words in large language models (LLMs) with hash-like meaningless identifiers to reduce cognitive biases and reliance on external knowledge. The method was tested across three sets of experiments involving a total of 490 prompts. Statistical analysis using chi-square tests showed significant improvements in all tested scenarios, which covered Llama, ChatGPT, Copilot, Gemini and Mixtral models. In the first experiment, hashing decreased the fallacy rate in a modified version of the "Linda" problem aimed at evaluating susceptibility to cognitive biases. In the second experiment, it improved LLM results on the frequent itemset extraction task. In the third experiment, we found hashing is also effective when the Linda problem is presented in a tabular format rather than text, indicating that the technique works across various input representations. Overall, the method was shown to improve bias reduction and incorporation of external knowledge. Despite bias reduction, hallucination rates were inconsistently reduced across LLM model types. These findings suggest that masking bias-inducing terms can improve LLM performance, although its effectiveness is model- and task-dependent.
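
The core transformation is simple to sketch. A minimal, hedged illustration (the identifier scheme and word list are my assumptions; the paper's exact masking may differ), using the loaded terms of the classic "Linda" problem:

```python
import hashlib
import re

def hash_words(text, bias_inducing_words):
    """Replace each listed word/phrase with a meaningless stable identifier."""
    mapping = {}
    for word in bias_inducing_words:
        token = "X" + hashlib.md5(word.lower().encode()).hexdigest()[:6]
        mapping[token] = word
        text = re.sub(rf"\b{re.escape(word)}\b", token, text, flags=re.IGNORECASE)
    return text, mapping  # the mapping restores words in the model's answer

masked, mapping = hash_words(
    "Linda is a bank teller and is active in the feminist movement.",
    ["bank teller", "feminist"],
)
print(masked)  # e.g. "Linda is a X1a2b3c and is active in the X4d5e6f movement."
```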